-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Episode iterator upgrades #216
Conversation
…ch/habitat-api into episode-iterator-upgrades
…ch/habitat-api into episode-iterator-upgrades
@JasonJiazhiZhang feel free to comment on the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing this to master. Checked the logic of EpisodeIterator
it's not so easy to follow. Maybe in future we would like it to have separate easier to understand subclasses of EpisodeIterator
.
Requested some changes.
habitat/core/dataset.py
Outdated
self._rep_count = 0 | ||
self._step_count = 0 | ||
|
||
self._switch_scene_if() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_episode
is not updated based on results of self._switch_scene_if()
.
habitat/core/dataset.py
Outdated
def _set_shuffle_intervals(self): | ||
if self.max_scene_repetition_episodes > 0: | ||
self._max_rep_episode = random.randint( | ||
int(0.8 * self.max_scene_repetition_episodes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create repetition_rand_interval = 0.2
param for EpisodeIterator
and do:
self._max_rep_step = random.randint(
int((1 - self.repetition_rand_interval) * self.max_scene_repetition_steps),
int((1 + self.repetition_rand_interval) * self.max_scene_repetition_steps),
)
Or even create util method:
@static
def _randomize_value(value, interval):
return random.randint(
int((1 - interval) * value),
int((1 + interval) * value),
)
@@ -213,6 +213,11 @@ def _update_step_stats(self) -> None: | |||
if self._past_limit(): | |||
self._episode_over = True | |||
|
|||
if self.episode_iterator is not None and isinstance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While self.episode_iterator
mentioned as Optional I don't see Env functioning without it here. Maybe check for subclass of EpisodeIterator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate PR probably makes sense for changing that functionality (dataset is also marked as optional).
habitat/core/dataset.py
Outdated
do_switch = True | ||
|
||
if do_switch: | ||
self._shuffle_iterator() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for self.shuffle == True
, here when we shuffling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. _shuffle_iterator
is used both to initiate a scene switch and to just the shuffle the episodes. self.shuffle
decides whether or not to shuffle the episode order on cycle or on load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's strange as there was no option to enable cycling
without episode shuffling
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cycle logic doesn't rely on the shuffle
… episode-iterator-upgrades
…ch/habitat-api into episode-iterator-upgrades
habitat/core/dataset.py
Outdated
:param num_episode_sample: number of episodes to be sampled. :py:`-1` | ||
for no sampling. | ||
""" | ||
self._repetition_rand_interval = 0.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move repetition_rand_interval
to init
argument with default value, otherwise no option to turn it off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on following up the comments. Last item:
We are getting shuffle automatically if max_counts
start working.
Maybe more logical will be to jump to next scene episodes without shuffling if shuffle isn't enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mathfac for looping me in and thank you @erikwijmans for implementing this! I have some minor comments. Overall looks great to me!
habitat/core/dataset.py
Outdated
else: | ||
self._max_rep_step = None | ||
|
||
def _switch_scene_if(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question here: would it be switching to frequently at the later stage of training? with potentially less than 100 episode per scene switch? Will it help to incorporate both count schemes, like a logical AND?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shuffling on steps is consistent from an optimization standpoint, you swap scenes every N parameters updates, which is why I like it :)
I don't think you can switch scenes "too often". Ideally, we'd just randomly sample a new episode irrespective of the scene it is in, but this incurs the non-trivial cost of swapping the scene way too often.
… comments to assert messages.
Pulled PR to understand the logic of changes. As result added some tests and small fix. |
habitat/core/dataset.py
Outdated
@@ -376,9 +377,6 @@ def _switch_scene(self) -> None: | |||
|
|||
if len(grouped_episodes) > 1: | |||
# Ensure we swap by moving the current group to the end | |||
if self.shuffle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this shuffle deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because, shuffles are already happening at the beginning and on each cycle. That's why this shuffle is redundant and can be omitted with no test failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess both logics are valid. Either you cycle through all scenes in a predetermined but random order, or you randomly draw a new scene from the set of other scenes. One doesn't seem inherently better than the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that it's already in predetermined but random order
because of self.shuffle
in other places. So there is no need to shuffle again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep yep, it's just different ideas of how scenes should be selected on a swap. I am fine with this way if it makes more sense to you, both make sense to me.
* Episode iterator upgrades * Make default slightly lower * Fix env.py * fix is true * Update docstring * Fix formatting * Turn off shuffle! * Changes * Fix test and incorperate comments * Don't always shuffle on scene swap * Rename and add doc string * Cleanup * Simplify shuffle logic * Fixed small bug with initial rep_count value, added more tests, moved comments to assert messages.
Motivation and Context
Add some upgrades to the episode iterator functionality. The primary thing this adds is the ability to shuffle scenes after some number of steps are taken. If you are shuffling after some number of episodes have ended, you end up shuffling too often at the start of training and not enough towards the end. This value is set to 10k by default. I consider this to be a "high but sane" default value. Not so low that it will dramatically slow-down training, but not so high that scenes will never be swapped.
Also randomizes the shuffle switch values to be within
[0.8, 1.2]
of the configured value. This helps not have all workers swap the same point for the shuffle by steps mode.Adds an initial shuffle to the list of episodes as otherwise the same scene will always be initially loaded (which is bad).
Sets the max steps per scene to a high but sane default value. We generated a huge number of episodes per scene for training (50k IIRC) and you basically never switch scenes toward the end of training with that.
Also makes the scene order get shuffled by default as this makes sense for the default (people rarely change the defaults, so they should be set reasonably).
How Has This Been Tested
Via the tests
Types of changes